Dirichlet draws are sparse with high probability 
Abstract

This note provides an elementary proof of the folklore fact that draws from a Dirichlet 
distribution (with parameters less than 1) are typically sparse (most coordinates are small). 



1 Bounds 

Let $\mathrm{Dir}(\alpha)$ denote a Dirichlet distribution with all parameters equal to $\alpha$.

Theorem 1.1. Suppose $n \ge 2$ and $(X_1, \ldots, X_n) \sim \mathrm{Dir}(1/n)$. Then, for any $c_0 \ge 1$ satisfying $6 c_0 \ln(n) + 1 < 3n$,

\[ \Pr\left[ \left|\{ i : X_i \ge n^{-c_0} \}\right| \le 6 c_0 \ln(n) \right] \ge 1 - \frac{1}{n^{c_0}}. \]

The parameter is taken to be $1/n$, which is standard in machine learning. The above theorem states that (with high probability) as the exponent on the sparsity threshold grows linearly ($n^{-1}, n^{-2}, n^{-3}, \ldots$), the number of coordinates above the threshold cannot grow faster than linearly ($6\ln(n), 12\ln(n), 18\ln(n), \ldots$).

The above statement can be parameterized slightly more finely, exposing more tradeoffs than 
just the threshold and number of coordinates. 

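Theorem 1.2. Suppose $n \ge 1$ and $c_1, c_2, c_3 > 0$ with $c_2 \ln(n) + 1 < 3n$, and $(X_1, \ldots, X_n) \sim \mathrm{Dir}(c_1/n)$; then

\[ \Pr\left[ \left|\{ i : X_i \ge n^{-c_3} \}\right| \le c_2 \ln(n) \right] \ge 1 - \frac{1}{e^{1/3}} \left(\frac{1}{n}\right)^{c_2/3 - c_1 c_3} - \frac{1}{e^{4/9}} \left(\frac{1}{n}\right)^{4 c_2/9}. \]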



The natural question is whether the factor $\ln(n)$ is an artifact of the analysis; simulation experiments with Dirichlet parameter $\alpha = 1/n$, summarized in Figure 1a, exhibit both the $\ln(n)$ term and the linear relationship between sparsity threshold and number of coordinates exceeding it.
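A small simulation in the spirit of the experiments just described can be sketched as follows. This is not the code behind Figure 1: the function names, the trial count, and the seed are assumptions made for illustration, and the sampler relies on the standard representation of a Dirichlet draw as normalized i.i.d. Gamma variables (Lemma 2.2).

```python
import math
import random


def sample_symmetric_dirichlet(n, alpha, rng):
    """Draw from Dir(alpha) in n dimensions by normalizing i.i.d. Gamma(alpha) draws."""
    ys = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(ys)
    return [y / total for y in ys]


def scaled_counts(n, exponents=(1, 2, 3), trials=200, seed=0):
    """Average of |{i : X_i >= n^-c}| / ln(n) over `trials` draws from Dir(1/n)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    sums = {c: 0.0 for c in exponents}
    for _ in range(trials):
        x = sample_symmetric_dirichlet(n, 1.0 / n, rng)
        for c in exponents:
            count = sum(1 for xi in x if xi >= n ** (-c))
            sums[c] += count / math.log(n)
    return {c: s / trials for c, s in sums.items()}


if __name__ == "__main__":
    for n in (50, 200):
        print(n, scaled_counts(n))
```

If the $\ln(n)$ scaling is the right one, the printed ratios for a fixed exponent should stay roughly stable as $n$ grows, and they should remain well below the $6c$ ceiling suggested by Theorem 1.1.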



The techniques here are loose when applied to the case $\alpha = o(1/n)$. In particular, Figure 1b suggests $\alpha = 1/n^2$ leads to a single nonsmall coordinate with high probability, which is stronger than what is captured by the following theorem.

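Theorem 1.3. Suppose $n \ge 3$ and $(X_1, \ldots, X_n) \sim \mathrm{Dir}(1/n^2)$; then

\[ \Pr\left[ \left|\{ i : X_i \ge n^{-2} \}\right| \le 5 \right] \ge 1 - e^{2/e - 2} - e^{-8/3} \ge 0.64. \]

Moreover, for any function $g : \mathbb{Z}_{++} \to \mathbb{R}_{++}$ and any $n$ satisfying $1 \le \ln(g(n)) < 3n - 1$,

\[ \Pr\left[ \left|\{ i : X_i \ge n^{-2} \}\right| \le \ln(g(n)) \right] \ge 1 - e^{2/e - 1/3} \left(\frac{1}{g(n)}\right)^{1/3} - e^{-4/9} \left(\frac{1}{g(n)}\right)^{4/9}. \]

(Take for instance $g$ to be the inverse Ackermann function.)
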
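The constants in Theorem 1.3 can be checked numerically. The sketch below evaluates the general expression $1 - e^{2/e} e^{-(k+1)/3} - e^{-4(k+1)/9}$ that appears in the proof of the theorem; the function name is an assumption made for illustration, and the expression is only meaningful as a bound when $k + 1 < 3n$.

```python
import math


def theorem_1_3_bound(k):
    """Evaluate 1 - e^{2/e} e^{-(k+1)/3} - e^{-4(k+1)/9} for a given k."""
    first = math.exp(2.0 / math.e - (k + 1) / 3.0)
    second = math.exp(-4.0 * (k + 1) / 9.0)
    return 1.0 - first - second


if __name__ == "__main__":
    # k = 5 gives roughly 0.648, which the theorem rounds down to 0.64.
    print(theorem_1_3_bound(5))
    # k = ln(g) for a few hypothetical values of g(n).
    for g in (math.e, 1e3, 1e6):
        print(g, theorem_1_3_bound(math.log(g)))
```

Setting $k = \ln(g(n))$ reproduces the second bound of the theorem, since $e^{-(\ln g + 1)/3} = e^{-1/3} g^{-1/3}$ and $e^{-4(\ln g + 1)/9} = e^{-4/9} g^{-4/9}$.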
Figure 1: For each Dirichlet parameter choice $\alpha \in \{n^{-1}, n^{-2}\}$ and each number of dimensions $n$ (horizontal axis), 1000 Dirichlet distributions were sampled. For each trial, the number of coordinates exceeding each of 4 choices of threshold $\epsilon$ was computed. In the case of $\alpha = n^{-1}$, these counts were then scaled by $\ln(n)$ to better coordinate with the suggested trends in Theorems 1.1 and 1.2. Finally, these counts (for each $(n, \epsilon)$) were converted into quantile curves (25%-75%).



2 Proofs 

Theorems 1.1 to 1.3 are established via the following lemma.

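Lemma 2.1. Let reals $\epsilon \in (0, 1]$ and $\alpha > 0$ and positive integers $k, n$ be given with $k + 1 < 3n$. Let $(X_1, \ldots, X_n) \sim \mathrm{Dir}(\alpha)$. Then

\[ \Pr\left[ \left|\{ i : X_i \ge \epsilon \}\right| \le k \right] \ge 1 - \epsilon^{-n\alpha} e^{-(k+1)/3} - e^{-4(k+1)/9}. \]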

The proof avoids dependencies between the coordinates of a Dirichlet draw via the following alternate representation. Throughout the rest of this section, let $\mathrm{Gamma}(\alpha)$ denote a Gamma distribution with parameter $\alpha$.



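Lemma 2.2. (See for instance Balakrishnan and Nevzorov (2003, Equation 27.17).) Let $\alpha > 0$ and $n \ge 1$ be given. If $(X_1, \ldots, X_n) \sim \mathrm{Dir}(\alpha)$ and $\{Y_i\}_{i=1}^n$ are $n$ i.i.d. copies of $\mathrm{Gamma}(\alpha)$, then

\[ (X_1, \ldots, X_n) \overset{d}{=} \left( \frac{Y_1}{\sum_i Y_i}, \ldots, \frac{Y_n}{\sum_i Y_i} \right). \]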
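The representation in Lemma 2.2 doubles as a sampling recipe. A minimal sketch, assuming unit-scale Gamma variables drawn with Python's standard `random.gammavariate` (the function name `dirichlet_via_gamma` is hypothetical):

```python
import random


def dirichlet_via_gamma(n, alpha, rng):
    """Return one draw from Dir(alpha) in n dimensions as normalized Gamma(alpha) draws."""
    ys = [rng.gammavariate(alpha, 1.0) for _ in range(n)]  # Y_1, ..., Y_n i.i.d. Gamma(alpha)
    total = sum(ys)                                        # S = sum_i Y_i
    return [y / total for y in ys]                         # (Y_1/S, ..., Y_n/S)


if __name__ == "__main__":
    rng = random.Random(0)
    x = dirichlet_via_gamma(10, 0.1, rng)
    print(sum(x), max(x))
```

For very small $\alpha$ (for instance $\alpha = 1/n^2$ with large $n$), individual Gamma draws can underflow to zero in floating point, so a log-space variant may be preferable in that regime.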



Before turning to the proof of Lemma 2.1, one more lemma is useful; it provides control over the Gamma distribution's cdf.

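Lemma 2.3. For any $\alpha > 0$, $c \ge 0$, and $z \ge 1$,

\[ \Pr[\mathrm{Gamma}(\alpha) \le zc] \le z^{\alpha} \Pr[\mathrm{Gamma}(\alpha) \le c]. \]

Proof. Since $e^{-zx} \le e^{-x}$ for every $x \ge 0$ and $z \ge 1$, the change of variables $x \mapsto zx$ gives

\[ \Pr[\mathrm{Gamma}(\alpha) \le zc] = \frac{1}{\Gamma(\alpha)} \int_0^{zc} e^{-x} x^{\alpha-1}\,dx = \frac{1}{\Gamma(\alpha)} \int_0^{c} e^{-zx} (zx)^{\alpha-1} z\,dx \le \frac{z^{\alpha}}{\Gamma(\alpha)} \int_0^{c} e^{-x} x^{\alpha-1}\,dx = z^{\alpha} \Pr[\mathrm{Gamma}(\alpha) \le c]. \]
□
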
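Lemma 2.3 can be spot-checked numerically. The sketch below computes the unit-scale Gamma cdf from the classical power series $x^{\alpha} e^{-x} \sum_{m \ge 0} x^m / \Gamma(\alpha + m + 1)$ for the regularized lower incomplete gamma function; the helper names and the number of series terms are assumptions chosen for illustration, and the series is only intended for moderate $x$.

```python
import math


def gamma_cdf(a, x, terms=200):
    """Pr[Gamma(a) <= x] (unit scale) via x^a e^{-x} * sum_m x^m / Gamma(a + m + 1)."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / math.gamma(a + 1.0)  # m = 0 term of the series
    total = term
    for m in range(1, terms):
        term *= x / (a + m)           # ratio of consecutive series terms
        total += term
    return math.exp(a * math.log(x) - x) * total


def lemma_2_3_gap(a, c, z):
    """Right-hand side minus left-hand side of Lemma 2.3; nonnegative if the lemma holds."""
    return z ** a * gamma_cdf(a, c) - gamma_cdf(a, z * c)


if __name__ == "__main__":
    for a in (0.01, 0.5, 3.0):
        print(a, lemma_2_3_gap(a, c=1.0, z=5.0))
```

A nonnegative gap at the sampled points is consistent with the lemma; it is a sanity check on a handful of values, not a substitute for the proof.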
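Proof of Lemma 2.1. Since $z \mapsto \Pr[\mathrm{Gamma}(\alpha) \ge z]$ is continuous and has range $[0, 1]$, choose $c \ge 0$ so that

\[ \Pr[\mathrm{Gamma}(\alpha) \ge c] = \Pr[\mathrm{Gamma}(\alpha) > c] = \frac{k+1}{3n}, \tag{2.4} \]

where $(k+1)/(3n) < 1$. By this choice and Lemma 2.3 (applied with $z = 1/\epsilon \ge 1$),

\[ \Pr[\mathrm{Gamma}(\alpha) \le c/\epsilon] \le \epsilon^{-\alpha} \Pr[\mathrm{Gamma}(\alpha) \le c] = \epsilon^{-\alpha} \left( 1 - \frac{k+1}{3n} \right) \le \epsilon^{-\alpha} e^{-(k+1)/(3n)}. \tag{2.5} \]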

Now let $\{Y_i\}_{i=1}^n$ be $n$ i.i.d. copies of $\mathrm{Gamma}(\alpha)$. Define the events

\[ A := \left[ \exists i \in [n] \,.\, Y_i \ge c/\epsilon \right] \quad \text{and} \quad B := \left[ \left|\{ i \in [n] : Y_i \le c \}\right| \ge n - k \right]. \]

The remainder of the proof will establish a lower bound on $\Pr(A \wedge B)$. To see that this finishes the proof, define $S := \sum_i Y_i$; since event $A$ implies that $S \ge c/\epsilon$, it follows that $Y_i \le c$ implies $Y_i/S \le \epsilon$. Consequently, events $A$ and $B$ together imply that $Y_i/S \le \epsilon$ for at least $n - k$ choices of $i$. By Lemma 2.2, it follows that $\Pr(A \wedge B)$ is a lower bound on the probability that a draw from $\mathrm{Dir}(\alpha)$ has at least $n - k$ coordinates which are at most $\epsilon$.
Returning to task, note that

\[ \Pr(A \wedge B) = 1 - \Pr(\neg A \vee \neg B) \ge 1 - \Pr(\neg A) - \Pr(\neg B). \tag{2.6} \]

To bound the first term, by eq. (2.5),

\[ \Pr(\neg A) = \Pr\left[ \forall i \in [n] \,.\, Y_i < c/\epsilon \right] = \Pr[Y_1 < c/\epsilon]^n \le \epsilon^{-n\alpha} e^{-(k+1)/3}. \tag{2.7} \]

For the second term, define indicator random variables $Z_i := \mathbb{1}[Y_i > c]$, whereby

\[ \mathbb{E}(Z_i) = \Pr[Z_i = 1] = \Pr[Y_i > c] = \Pr[Y_i \ge c] = \frac{k+1}{3n}. \]



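Then, by a multiplicative Chernoff bound (Kearns and Vazirani 1994, Theorem 9.2),

\[ \Pr(\neg B) = \Pr\left[ \left|\{ i \in [n] : Y_i > c \}\right| \ge k + 1 \right] = \Pr\left[ \sum_i Z_i \ge 3n\,\mathbb{E}(Z_1) \right] \le \exp\left( -4n\,\mathbb{E}(Z_1)/3 \right). \tag{2.8} \]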



Inserting (2.7) and (2.8) into the lower bound on $\Pr(A \wedge B)$ in (2.6),

\[ \Pr(A \wedge B) \ge 1 - \epsilon^{-n\alpha} e^{-(k+1)/3} - e^{-4(k+1)/9}. \]
□

Proof of Theorem 1.2. Instantiate Lemma 2.1 with $k = c_2 \ln(n)$, $\alpha = c_1/n$, and $\epsilon = n^{-c_3}$. □

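The Chernoff step (2.8) in the proof of Lemma 2.1 can be compared against the exact binomial tail, because $\sum_i Z_i$ is binomial with $n$ trials and success probability $(k+1)/(3n)$. The helper names below are assumptions made for illustration. Textbook statements of the multiplicative bound are often restricted to relative deviations of at most 1, while (2.8) uses a relative deviation of 2, so the comparison is only evidence for the small values tried, not a verification of the inequality in general.

```python
import math


def binomial_upper_tail(n, p, m):
    """Exact Pr[Bin(n, p) >= m], summed term by term (intended for moderate n)."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(m, n + 1))


def chernoff_step(n, k):
    """Return (exact Pr(not B), the bound exp(-4(k+1)/9)) with p = (k+1)/(3n)."""
    p = (k + 1) / (3.0 * n)
    exact = binomial_upper_tail(n, p, k + 1)
    bound = math.exp(-4.0 * (k + 1) / 9.0)
    return exact, bound


if __name__ == "__main__":
    for n, k in ((10, 2), (30, 5), (100, 20)):
        print(n, k, chernoff_step(n, k))
```

Each pair must satisfy $k + 1 < 3n$ so that the success probability is below 1, matching the hypothesis of Lemma 2.1.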
Proof of Theorem 1.1. Instantiate Theorem 1.2 with $c_1 = 1$, $c_2 = 6c_0$, $c_3 = c_0$, and note

\[ \frac{1}{e^{1/3}} \left(\frac{1}{n}\right)^{c_0} + \frac{1}{e^{4/9}} \left(\frac{1}{n}\right)^{8c_0/3} \le \frac{1}{n^{c_0}} \left( \frac{1}{e^{1/3}} + \frac{1}{e^{4/9}} \left(\frac{1}{2}\right)^{5/3} \right) \le \frac{1}{n^{c_0}}. \]
□
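The reduction from Theorem 1.2 to Theorem 1.1 can be checked numerically. The sketch below evaluates the bound obtained by instantiating Lemma 2.1 with $k = c_2 \ln(n)$, $\alpha = c_1/n$, and $\epsilon = n^{-c_3}$, and compares it with $1 - n^{-c_0}$; the function names are assumptions made for illustration.

```python
import math


def theorem_1_2_bound(n, c1, c2, c3):
    """1 - e^{-1/3} n^{c1*c3 - c2/3} - e^{-4/9} n^{-4*c2/9}; requires c2*ln(n) + 1 < 3n."""
    if not c2 * math.log(n) + 1 < 3 * n:
        raise ValueError("need c2*ln(n) + 1 < 3n")
    first = math.exp(-1.0 / 3.0) * n ** (c1 * c3 - c2 / 3.0)
    second = math.exp(-4.0 / 9.0) * n ** (-4.0 * c2 / 9.0)
    return 1.0 - first - second


def theorem_1_1_bound(n, c0):
    """The simpler bound 1 - n^{-c0} of Theorem 1.1."""
    return 1.0 - float(n) ** (-c0)


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, theorem_1_2_bound(n, 1, 6, 1), theorem_1_1_bound(n, 1))
```

With $c_1 = 1$, $c_2 = 6c_0$, $c_3 = c_0$, the exponents become $c_1 c_3 - c_2/3 = -c_0$ and $-4c_2/9 = -8c_0/3$, which is where the two terms in the proof above come from.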

Proof of Theorem 1.3. Define the function $f(z) := z^{-z}$ over $(0, \infty)$. Note that $f'(z) = -(\ln(z) + 1) z^{-z}$, which is positive for $z < 1/e$, zero at $z = 1/e$, and negative thereafter; consequently, $\sup_{z \in (0, \infty)} f(z) = f(1/e) = e^{1/e}$, and in particular $n^{2/n} = f(1/n)^2 \le e^{2/e}$. As such, instantiating Lemma 2.1 with $\epsilon = n^{-2}$, $\alpha = n^{-2}$, and any $k < 3n - 1$ gives

\[ \Pr\left[ \left|\{ i : X_i \ge n^{-2} \}\right| \le k \right] \ge 1 - n^{2/n} e^{-(k+1)/3} - e^{-4(k+1)/9} \ge 1 - e^{2/e} e^{-(k+1)/3} - e^{-4(k+1)/9}. \]

Plugging in $k \in \{5, \ln(g(n))\}$ gives the two bounds. □